arXiv:1501.01817v2 [cs.DB] 12 Jan 2015 


The Hunt for a Red Spider: 
Conjunctive Query Determinacy Is Undecidable 


Tomasz Gogacz, Jerzy Marcinkowski, 

Institute of Computer Science, University Of Wroclaw, 

January 8th, 2015 


Abstract: We solve a well-known, long-standing open problem
in relational database theory, showing that the conjunctive
query determinacy problem (in its "unrestricted" version) is
undecidable.

I. Introduction 

Imagine there is a database we have no direct access to,
but there are views of this database available to us, defined
by some conjunctive queries $Q_1, Q_2, \ldots, Q_k$. And we are
given another conjunctive query $Q_0$. Will we be able to
compute $Q_0$ using only the available views? The answer
depends on whether the queries $Q_1, Q_2, \ldots, Q_k$ determine
query $Q_0$. To state it more precisely, the Conjunctive Query
Determinacy Problem (CQDP) is:


The instance of the problem is a set of conjunctive
queries $\mathcal{Q} = \{Q_1, \ldots, Q_k\}$, and another conjunctive
query $Q_0$.

The question is whether $\mathcal{Q}$ determines $Q_0$, which means
that for each two structures (database instances) $\mathbb{D}_1$ and
$\mathbb{D}_2$ such that $Q(\mathbb{D}_1) = Q(\mathbb{D}_2)$ for each $Q \in \mathcal{Q}$, it also
holds that $Q_0(\mathbb{D}_1) = Q_0(\mathbb{D}_2)$.


The technical result of this paper is:

Theorem 1: CQDP is undecidable. 

It is hard to imagine a more natural problem than CQDP,
or a better motivated one. Answering queries using views appears
in various contexts: see for example [Hal01] for a survey,
or [DPT99], where the context is the optimization of query
evaluation plans. For more recent examples see [FG12],
where the view update problem is studied, and [FKN13],
where the context is description logics. It is fair to say
that many variants of the problem are being considered, and
the case we study, where both the views and the query are
conjunctive queries, is not the only possible scenario. But
it is of special importance, as CQs are, as [NSV07] puts
it, "the simplest and most common language to define
views and queries".

As we said, it is hard to imagine a more natural problem
than CQDP. So no wonder it has a 30-year history
as a research subject. And this history happens to be quite
complicated, marked by errors and confusion.

The oldest paper we were able to trace in which CQDP is
studied is [LY85], whose first sentence is almost the same
as ours: "Assume that a set of derived relations is available
in a stored form. Given a query, can it be computed from
the derived relations and, if so, how?". It was shown there,
and in the next paper [YL87] by the same authors, that
the problem is decidable if $\mathcal{Q}$ consists of just one query
without self-joins (there is however some additional form
of selection allowed there, so it is not really comparable
to the CQ paradigm). Over the next 30 years many other
decidable cases have been found. Let us just cite the most
recent results: [NSV10] shows that the problem is decidable
if each query from $\mathcal{Q}$ has only one free variable; in [Afr11]
decidability is shown for $\mathcal{Q}$ and $Q_0$ being "path queries".
This is generalized in [Pas11] to the scenario where $\mathcal{Q}$
are path queries but $Q_0$ is any conjunctive query.

As we said in the Abstract, decidability of CQDP was a
long-standing open problem. It was indeed open since 1985,
but not without pauses. It was shown in [LMS95] that it is
decidable whether, for given $\mathcal{Q}$ and $Q_0$ as in CQDP,
there exists another query $Q'$, over the signature consisting
of $Q_1, Q_2, \ldots, Q_k$, such that for each structure (database
instance) $\mathbb{D}$ we have $Q_0(\mathbb{D}) = Q'(Q_1(\mathbb{D}), \ldots, Q_k(\mathbb{D}))$ (notice
that while the "input" of $Q_0$ consists of the relations of $\mathbb{D}$, which we
do not have access to, the "input" of $Q'$ consists of the views that
we are allowed to see). Existence of such $Q'$, a rewriting
of $Q_0$, indeed implies determinacy. But, and this fact
was for a long time surprisingly poorly understood,
determinacy does not necessarily imply existence of a rewriting.
There is no sign in [LMS95] that the authors were aware
of this distinction, and it seems that the first to realize that
there is any problem here were the authors of [SV05]. After
realizing that conjunctive query determinacy and conjunctive
query rewriting (as defined above) are possibly two different
notions, they show that they are in fact equivalent. Together
with the result of [LMS95] this would imply decidability
of CQDP. But, again surprisingly, this proof was not
correct, as spotted by (a superset of) the authors of [SV05]
in [NSV07]. Also in [NSV07] a (correct) counterexample is
presented, of $\mathcal{Q}$ and $Q_0$ such that $\mathcal{Q}$ determines $Q_0$ but no
rewriting $Q'$ being a CQ exists. In fact, as is also shown
in [NSV07], $Q_0(\mathbb{D})$ is not always a monotonic function of
$Q_1(\mathbb{D}), \ldots, Q_k(\mathbb{D})$.

Coming back to decidability of the determinacy problem:
the paper [NSV07] is also the first to present a negative
result. It was shown there that the problem is undecidable
if unions of conjunctive queries are allowed rather than
CQs. In [NSV10] it was also proved that determinacy is
undecidable if the elements of $\mathcal{Q}$ are CQs and $Q_0$ is a first
order sentence (or the other way round). Another negative
result is presented in [FGZ12]: determinacy is shown there
to be undecidable if $\mathcal{Q}$ is a DATALOG program and $Q_0$ is
a CQ.

In our setting the instance of the problem consists of the
set $\mathcal{Q}$ of the queries that define the views and of the query
$Q_0$. A natural question to ask is what happens if
$Q_1(\mathbb{D}), \ldots, Q_k(\mathbb{D})$ were also part of the input. This problem
can be easily shown to be decidable. Its complexity is studied
in [AD98].

A. Finite vs. unrestricted case.

As usual in database theory, there are two variants of
the problem that one can consider: finite, where all the
structures in question (which in our case means $\mathbb{D}_1$ and $\mathbb{D}_2$)
are assumed to be finite, and unrestricted, where there is no
such assumption. Most of the results of [LMS95], [NSV07],
[NSV10], [Afr11] and [Pas11] that we report above hold
true regardless of the finiteness assumption. Unlike them,
Theorem 1 of this paper concerns the unrestricted case only.
Decidability of CQDP for the finite case remains open.

OUTLINE 

The rest of the paper is devoted to the proof of Theorem 1.


II. Preliminaries 

In Section II-A we recall some standard finite model
theory/database theory notions. The way we present them
is rather standard. In Sections II-B and II-C we also recall
standard notions, but our notation may be seen as slightly
non-standard (although of course equivalent to the standard one).
This is how we think we need them in this paper.

A. Basic notions 

When we say "structure" we mean a relational structure
$\mathbb{D}$ over some signature $\Sigma$, i.e. a set of elements (vertices),
denoted as $Dom(\mathbb{D})$, and a set of relational atoms, whose
arguments are elements of $\mathbb{D}$ and whose predicate names
are from $\Sigma$. Atoms are (of course) only positive. For an
atomic formula $A$ we use the notation $\mathbb{D} \models A$ to say that $A$ is
an atom of $\mathbb{D}$.

Apart from predicate symbols, $\Sigma$ can also contain constants.
If $c$ is a constant from $\Sigma$ and $\mathbb{D}$ is a structure over $\Sigma$
then $c \in Dom(\mathbb{D})$.

$\mathbb{D}_1$ is a substructure of $\mathbb{D}$ (and $\mathbb{D}$ is a superstructure of $\mathbb{D}_1$)
if for each atom $A$, if $\mathbb{D}_1 \models A$ then $\mathbb{D} \models A$. This implies
that $Dom(\mathbb{D}_1) \subseteq Dom(\mathbb{D})$.

For two structures $\mathbb{D}_1$ and $\mathbb{D}$ over the same signature
$\Sigma$, a function $h : Dom(\mathbb{D}_1) \to Dom(\mathbb{D})$ is called a
homomorphism if for each $P \in \Sigma$ of arity $l$ and each tuple
$\bar a \in Dom(\mathbb{D}_1)^l$, if $\mathbb{D}_1 \models P(\bar a)$ then $\mathbb{D} \models P(h(\bar a))$ (where
$h(\bar a)$ is the tuple of images of the elements of $\bar a$). Notice that $\mathbb{D}_1$
is a substructure of $\mathbb{D}$ if and only if the identity is a homomorphism
from $\mathbb{D}_1$ to $\mathbb{D}$.

A conjunctive query (over $\Sigma$), in short CQ, is a conjunction
of atomic formulas (over $\Sigma$) whose arguments are either
variables or constants from $\Sigma$, preceded by existential
quantifiers binding some of the variables. It is very important
in this paper to distinguish between a conjunctive query
and its quantifier-free part. We usually write $\Psi$ or $\Phi$ for
a conjunction of atoms without quantifiers and $Q$ (possibly
with a subscript) for conjunctive queries, so that we have
something like:

$Q(\bar x) = \exists \bar y\ \Psi(\bar y, \bar x)$

where $\Psi(\bar y, \bar x)$ is a formula being a conjunction of atomic
formulas and $\bar x$ is a tuple of variables which are free in $Q$.

For a conjunction of atoms $\Psi$ (or for a CQ $Q(\bar x) =
\exists \bar y\ \Psi(\bar y, \bar x)$) the canonical structure of $\Psi$, denoted as $A[\Psi]$,
is the structure whose elements are all the variables and
constants appearing in $\Psi$ and whose atoms are the atoms of
$\Psi$. It is useful to notice that for a structure $\mathbb{D}$ and a set
$V \subseteq Dom(\mathbb{D})$ there is a unique conjunctive query $Q$ such
that $\mathbb{D} = A[Q]$ and $V$ is the set of free variables of $Q$.

For a CQ $Q(\bar x) = \exists \bar y\ \Psi(\bar y, \bar x)$ with $\bar x = x_1, \ldots, x_l$, a
structure $\mathbb{D}$ and a tuple $a_1, \ldots, a_l$ of elements of $\mathbb{D}$ we write
$\mathbb{D} \models Q(a_1, \ldots, a_l)$ when there exists a homomorphism $h :
A[\Psi] \to \mathbb{D}$ such that $h(x_i) = a_i$ for each $i$.

Sometimes we also write $\mathbb{D} \models Q$. Then we assume that all
the free variables of $Q$ are implicitly existentially quantified,
so that the meaning of the notation is that there exists any
homomorphism $h : A[\Psi] \to \mathbb{D}$.

The most fundamental definition of this paper, needed
to formulate the problem we solve, is the following: for a CQ $Q$ and for a
structure $\mathbb{D}$, by $Q(\mathbb{D})$ we denote the "view defined by $Q$ over
$\mathbb{D}$", which is the relation $\{\bar a : \mathbb{D} \models Q(\bar a)\}$.
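
The notions of homomorphism, of satisfaction of a CQ by a tuple, and of the view defined by a query can be tried out on small finite instances. The following is a minimal Python sketch, not taken from the paper: the encoding of atoms as tuples, the `?` prefix marking variables, and all function names are assumptions made for illustration. It enumerates homomorphisms by brute force, so it is only meant for toy inputs, and it can only refute determinacy on a given pair of instances, never prove it.

```python
from itertools import product

# Assumed encoding: a structure is a set of atoms, an atom is a tuple
# (predicate, arg1, ..., argk). A CQ is a pair (free_vars, atoms).
# Strings starting with "?" are variables, everything else is a constant.

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def domain(structure):
    return {arg for atom in structure for arg in atom[1:]}

def homomorphisms(atoms, structure):
    """Yield every map h from the variables of `atoms` into Dom(structure)
    such that each atom, with h applied, is an atom of `structure`."""
    variables = sorted({t for atom in atoms for t in atom[1:] if is_var(t)})
    elems = sorted(domain(structure), key=repr)
    for values in product(elems, repeat=len(variables)):
        h = dict(zip(variables, values))
        image = {(a[0],) + tuple(h.get(t, t) for t in a[1:]) for a in atoms}
        if image <= structure:
            yield h

def view(query, structure):
    """The relation Q(D): all tuples a such that D satisfies Q(a)."""
    free_vars, atoms = query
    return {tuple(h[x] for x in free_vars) for h in homomorphisms(atoms, structure)}

def refutes_determinacy(views, q0, d1, d2):
    """True if d1 and d2 agree on every view but disagree on q0."""
    same_views = all(view(q, d1) == view(q, d2) for q in views)
    return same_views and view(q0, d1) != view(q0, d2)
```

For instance, with the projection view $Q_1(x) = \exists y\ E(x,y)$ and $Q_0(x,y) = E(x,y)$, the instances $\{E(1,2)\}$ and $\{E(1,3)\}$ have the same view but different answers to $Q_0$, so this pair witnesses that $Q_1$ does not determine $Q_0$.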

B. TGDs and how they act on a structure 

A Tuple Generating Dependency (or TGD) is a formula
of the form:

$\forall \bar x, \bar y\ [\Phi(\bar x, \bar y) \Rightarrow \exists \bar z\ \Psi(\bar z, \bar y)]$

where $\Psi$ and $\Phi$ are, as always, conjunctions of atomic
formulas. The standard convention, which we will obey, is
that the universal quantifiers in front of the TGD are omitted.

































From the point of view of this paper it is important to see
a TGD (let it be $T$, equal to $\Phi(\bar x, \bar y) \Rightarrow \exists \bar z\ \Psi(\bar z, \bar y)$) as a
procedure whose input is a structure $\mathbb{D}$ and whose output is
a new structure, being a superstructure of $\mathbb{D}$:

find a tuple $\bar b$ (with $|\bar b| = |\bar y|$) such that:

(i) $\mathbb{D} \models \exists \bar x\ \Phi(\bar x, \bar b)$ via a homomorphism $h$, but

(ii) $\mathbb{D} \not\models \exists \bar z\ \Psi(\bar z, \bar b)$;

create a new copy of $A[\Psi]$;

output $T(\mathbb{D}, \bar b)$ being the union of $\mathbb{D}$ and
the new copy of $A[\Psi]$, with each $y$ from $\bar y$
identified with $h(y)$ in $\mathbb{D}$.

The message, which will be good to remember, is that
the interface between the "new" part of the structure, added
by a single application of a TGD to a structure, and the
"old" structure, consists of the free variables of the query on the
right-hand side of the TGD.

C. Chase and its universality

For a structure $\mathbb{D}$ and a set $\mathcal{T}$ of TGDs let $ToDo(\mathcal{T}, \mathbb{D})$
be the set of all pairs $(\bar b, T)$ such that $T$, equal to $\Phi(\bar x, \bar y) \Rightarrow
\exists \bar z\ \Psi(\bar z, \bar y)$, is a TGD from $\mathcal{T}$ and $\bar b$ satisfies conditions (i)
and (ii). Roughly speaking, $ToDo(\mathcal{T}, \mathbb{D})$ is the set of tuples of
elements of $\mathbb{D}$ which satisfy the left-hand side of some TGD
in $\mathcal{T}$ but still wait for a witness, confirming that they also
satisfy the right-hand side, to be added to the structure.

A sequence $\{\mathbb{D}_i\}_{i \in \Omega}$ of structures, for some ordinal
number $\Omega$, will be called fair (with respect to $\mathcal{T}$ and $\mathbb{D}$)
if:

• $\mathbb{D}_0 = \mathbb{D}$;

• for each $i > 0$ we have $\mathbb{D}_i = T(\bigcup_{j<i} \mathbb{D}_j, \bar b)$ for some
$(\bar b, T) \in ToDo(\mathcal{T}, \bigcup_{j<i} \mathbb{D}_j)$;

• for each $(\bar b, T) \in ToDo(\mathcal{T}, \bigcup_{j<i} \mathbb{D}_j)$ for some $i$, there is
$k > i$ such that $(\bar b, T) \notin ToDo(\mathcal{T}, \mathbb{D}_k)$.

Let $\{\mathbb{D}_i\}_{i \in \Omega}$ be a fair sequence (with respect to $\mathcal{T}$ and
$\mathbb{D}$). Then the structure $Chase(\mathcal{T}, \mathbb{D})$ is defined as $\bigcup_{i \in \Omega} \mathbb{D}_i$
and each of the sets $\mathbb{D}_i$ is called a stage of Chase.

In other words, $Chase(\mathcal{T}, \mathbb{D})$ is a structure being the result of
adding, one by one, tuples that witness that some TGD from
$\mathcal{T}$ is satisfied for a given tuple from the current structure.
The set ToDo always contains tuples that do not have the
required witnesses yet. Notice that there are two possible
ways for a pair $(\bar b, T)$ to disappear from the set ToDo:
one is that the TGD $T$ is applied to the tuple $\bar b$ at some
step. But it may also happen that the witnesses $\bar b$ needs are
added as a side-effect of other TGDs being applied to other
tuples.
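
The procedural reading of a TGD and the ToDo set can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' construction: the tuple encoding of atoms, the `?` prefix for variables, the triple `(body, head, frontier)` used for a TGD, and the `max_steps` cut-off are all assumptions. In particular, the loop always takes the first pending pair and stops after a bounded number of steps, so it does not implement a genuinely fair (possibly transfinite) sequence.

```python
from itertools import count, product

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def matches(atoms, structure, fixed=None):
    """All homomorphisms from `atoms` into `structure` that extend `fixed`."""
    fixed = dict(fixed or {})
    free = sorted({t for a in atoms for t in a[1:] if is_var(t) and t not in fixed})
    elems = sorted({t for a in structure for t in a[1:]}, key=repr)
    for values in product(elems, repeat=len(free)):
        h = {**fixed, **dict(zip(free, values))}
        if all((a[0],) + tuple(h.get(t, t) for t in a[1:]) in structure for a in atoms):
            yield h

def todo(tgds, structure):
    """Pairs (b, T): the body of T is matched at b, but the head has no witness."""
    pending = []
    for tgd in tgds:
        body, head, frontier = tgd
        for h in matches(body, structure):
            b = tuple((y, h[y]) for y in frontier)
            no_witness = next(matches(head, structure, dict(b)), None) is None
            if no_witness and (b, tgd) not in pending:
                pending.append((b, tgd))
    return pending

def apply_tgd(tgd, b, structure, fresh):
    """One step T(D, b): add a copy of the head, identifying the frontier
    variables with b and turning all other head variables into new elements."""
    _, head, _ = tgd
    h = dict(b)
    for atom in head:
        for t in atom[1:]:
            if is_var(t) and t not in h:
                h[t] = f"n{next(fresh)}"
    return structure | {(a[0],) + tuple(h.get(t, t) for t in a[1:]) for a in head}

def chase(tgds, structure, max_steps=100):
    """Standard (lazy) chase, cut off after max_steps TGD applications."""
    structure, fresh = set(structure), count()
    for _ in range(max_steps):
        pending = todo(tgds, structure)
        if not pending:
            return structure
        b, tgd = pending[0]
        structure = apply_tgd(tgd, b, structure, fresh)
    return structure
```

As a usage example, take the two TGDs $E_G(x,y) \Rightarrow \exists z\ E_R(x,z)$ and $E_R(x,y) \Rightarrow \exists z\ E_G(x,z)$ (with hypothetical predicate names `GE` and `RE`) and start from the single atom `GE(a, b)`: one step adds `RE(a, n0)`, after which neither TGD has a pending pair and the chase stops.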

It may appear, and not without reason, that the structure
$Chase(\mathcal{T}, \mathbb{D})$ depends on the order in which tuples are
selected from the ToDo set. But the beautiful fact (and a
well-known one, see [JK82])¹ is that, regardless of the ordering:

¹ A reader who is aware of the difference between standard and oblivious
Chase will notice that what we define is the standard/lazy version.


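Theorem 2 (Chase as universal structure):

• $Chase(\mathcal{T}, \mathbb{D}) \models \mathcal{T}$. In other words, if
$\Phi(\bar x, \bar y) \Rightarrow \exists \bar z\ \Psi(\bar z, \bar y)$ is a TGD from $\mathcal{T}$
and $Chase(\mathcal{T}, \mathbb{D}) \models \exists \bar x\ \Phi(\bar x, \bar b)$ then also
$Chase(\mathcal{T}, \mathbb{D}) \models \exists \bar z\ \Psi(\bar z, \bar b)$.

• Let $\mathbb{M}$ be any superstructure of $\mathbb{D}$ such that $\mathbb{M} \models \mathcal{T}$ and
let $Q$ be any conjunctive query such that $Chase(\mathcal{T}, \mathbb{D}) \models Q$.
Then also $\mathbb{M} \models Q$.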

Most of the lemmas in this paper concerning the structure
of $Chase(\mathcal{T}, \mathbb{D})$ for specific $\mathcal{T}$ and $\mathbb{D}$ are proved by
induction on a respective fair sequence, even if this is not
always mentioned explicitly.

D. Thue systems 

Our undecidability proof is by reduction from a variant of
the Thue systems word problem (also known as the semigroup
word problem). A Thue system is given by a finite symmetric
relation $\Pi \subseteq A^* \times A^*$ for some finite alphabet $A$. For two
words $w, w' \in A^*$ we define $w \leftrightarrow_\Pi w'$ if and only if
there are words $v, v' \in A^*$ and a pair $(t, t') \in \Pi$ such
that $w = vtv'$ and $w' = vt'v'$. Relation $\leftrightarrow^*_\Pi$ is defined
as the transitive closure of $\leftrightarrow_\Pi$. Various undecidability
results involving relation $\leftrightarrow^*_\Pi$ can be proved using standard
techniques from [Dav77].
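
The one-step rewriting relation of a Thue system is easy to make concrete. Below is a small Python sketch under stated assumptions: words are Python strings, the relation is given as a set of pairs of strings, and the function names and the `max_len` bound are invented for illustration. Because the word problem is undecidable in general, the search is bounded by word length; a `False` answer only says that no derivation exists within that bound. The sketch also treats every word as related to itself, which is a simplification of the transitive closure.

```python
from collections import deque

def symmetric(pairs):
    """Close a set of pairs under swapping, as a Thue system requires."""
    return set(pairs) | {(b, a) for a, b in pairs}

def one_step(word, pairs):
    """All words w' = v t' v' obtained from word = v t v' for some (t, t') in pairs."""
    result = set()
    for t, t_prime in pairs:
        start = word.find(t)
        while start != -1:
            result.add(word[:start] + t_prime + word[start + len(t):])
            start = word.find(t, start + 1)
    return result

def related_within(w, target, pairs, max_len):
    """Breadth-first search for a derivation from w to target that never
    leaves the set of words of length at most max_len."""
    pairs = symmetric(pairs)
    seen, queue = {w}, deque([w])
    while queue:
        current = queue.popleft()
        if current == target:
            return True
        for nxt in one_step(current, pairs):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

With the single pair `("ab", "ba")`, the word `aab` can be rewritten to `baa` in two steps, while `abb` is out of reach because the rule preserves the number of occurrences of each letter.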

III. Green-Red TGDs 

A. Green-Red Signature 

For a given signature $\Sigma$ let $\Sigma_G$ and $\Sigma_R$ be two copies
of $\Sigma$ having new relation symbols, which have the same
names and the same arities as the symbols in $\Sigma$ but are written
in green and red respectively. Let $\bar\Sigma$ be the union of $\Sigma_G$ and
$\Sigma_R$. Notice that the constants from $\Sigma$, not being relation
symbols, are never colored and thus survive in $\bar\Sigma$ unharmed.

For any formula $\Psi$ over $\Sigma$ let $R(\Psi)$ (or $G(\Psi)$) be the
result of painting all the predicates in $\Psi$ red (green). For
any formula $\Psi$ over $\bar\Sigma$ let $dalt(\Psi)$ ("daltonisation of $\Psi$") be
a formula over $\Sigma$ being the result of erasing the colors from
the predicates of $\Psi$. The same convention applies to structures.
Whenever an uncolored relation symbol (usually $H$) is used
in the context of $\bar\Sigma$ it should be understood as "$G(H)$ or
$R(H)$".
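
Painting and daltonisation are simple renamings of predicate symbols. A minimal Python sketch, assuming atoms are tuples whose first component is the predicate name and that a color is recorded as a prefix such as `G:` or `R:` (both the encoding and the function names are assumptions, not notation from the paper):

```python
def paint(color, atoms):
    """G(.) or R(.): color every predicate; arguments, including constants, are untouched."""
    return {(f"{color}:{a[0]}",) + tuple(a[1:]) for a in atoms}

def dalt(atoms):
    """Daltonisation: erase the color prefix from every predicate."""
    return {(a[0].split(":", 1)[1],) + tuple(a[1:]) for a in atoms}
```

Painting a set of atoms green and red gives two disjoint sets of atoms over the same elements, and erasing the colors from either of them, or from their union, gives back the original set.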

B. Having $\mathbb{D}$ instead of $\mathbb{D}_1$ and $\mathbb{D}_2$.

We prefer to restate CQDP a little bit in order to be talking 
about one database instance instead of two. Clearly CQDP 
is equivalent to: 






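The green-red conjunctive query determinacy problem
(GRCQDP). The instance of the problem is a finite
set $\mathcal{Q}$ of conjunctive queries and another conjunctive
query $Q_0$, all of them over some signature $\Sigma$. The
question is whether for each structure $\mathbb{D}$ over $\bar\Sigma$ such
that:

(*) $(G(Q))(\mathbb{D}) = (R(Q))(\mathbb{D})$ for each $Q \in \mathcal{Q}$

it also holds that $(G(Q_0))(\mathbb{D}) = (R(Q_0))(\mathbb{D})$.


For a conjunctive query $Q$ of the form $\exists \bar x\ \Phi(\bar x, \bar y)$, where
$\Phi$ is a conjunction of atoms over $\Sigma$, let $Q^{G \to R}$ be the TGD
generated by $Q$ in the following sense:

$Q^{G \to R} = \forall \bar x, \bar y\ [G(\Phi)(\bar x, \bar y) \Rightarrow \exists \bar z\ R(\Phi)(\bar z, \bar y)]$

The TGD $Q^{R \to G}$ is defined in an analogous way. For a set $\mathcal{Q}$
as above let $\mathcal{T}_{\mathcal{Q}}$ be the set of all TGDs of the form $Q^{G \to R}$
or $Q^{R \to G}$ with $Q \in \mathcal{Q}$. It is very easy to see that:

Lemma 3: The above condition (*) is satisfied by a structure
$\mathbb{D}$ if and only if $\mathbb{D} \models \mathcal{T}_{\mathcal{Q}}$.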

Now GRCQDP can be again restated as:


Given a set $\mathcal{Q}$ (as in the original formulation of GRCQDP,
above), and another conjunctive query $Q_0$, is it
true that:

(**) for each structure $\mathbb{D}$ and each tuple $\bar a$ of elements of
$\mathbb{D}$, if $\mathbb{D} \models \mathcal{T}_{\mathcal{Q}}, G(Q_0)(\bar a)$ then also $\mathbb{D} \models R(Q_0)(\bar a)$ ?


But (**) means that $\mathcal{T}_{\mathcal{Q}}, G(Q_0)(\bar a) \models R(Q_0)(\bar a)$, where $\bar a$ is
a tuple of new constants. Thus, by Theorem 2, CQDP is
equivalent to²:


CQDP - the green-red Chase version (CQDP-GRC).

Given the set $\mathcal{Q}$ (as in the original formulation, above),
and another conjunctive query $Q_0$, is it true that:

$Chase(\mathcal{T}_{\mathcal{Q}}, A[G(Q_0)(\bar a)]) \models R(Q_0)(\bar a)$ ?

where $A[G(Q_0)(\bar a)]$ is the canonical structure of
$G(Q_0)(\bar a)$.

The main technical result of this paper is: 

Theorem 4 (Theorem 1 restated): CQDP-GRC is undecidable.

Of course the problem of determining, for a given set $\mathcal{T}$
of TGDs, database instance $\mathbb{D}$ and query $Q$, whether
$Chase(\mathcal{T}, \mathbb{D}) \models Q$, is undecidable in general. But this does
not a priori mean that CQDP-GRC is undecidable, since
the TGDs we allow here are of a very special green-red form
(with the head being just a recoloring of the body) and since
we only consider $Q$ being a recoloring of $\mathbb{D}$.

² The observation that determinacy can be semi-decided using chase is
not ours and is at least as old as [NSV07]. The difference is that in [NSV07]
they prefer to see two separate structures rather than two colors.

C. Idempotence lemma 

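One useful feature of the green-red TGDs is described in
the following easy lemma:

Lemma 5: Let $\mathcal{Q}$ be a set of conjunctive queries and let the
set $\mathcal{T}_{\mathcal{Q}}$ of the green-red TGDs generated by $\mathcal{Q}$ be defined as
before. Let $T$ be $Q^{R \to G}$ for some $Q \in \mathcal{Q}$ and suppose $\bar b \in \mathbb{D}$
is such that $(\bar b, T) \in ToDo(\mathcal{T}_{\mathcal{Q}}, \mathbb{D})$. Suppose $\mathbb{D}' = T(\mathbb{D}, \bar b)$.
Then $(\bar b, Q^{G \to R}) \notin ToDo(\mathcal{T}_{\mathcal{Q}}, \mathbb{D}')$.

Proof: The necessary condition for $(\bar b, Q^{R \to G})$ to be in
$ToDo(\mathcal{T}_{\mathcal{Q}}, \mathbb{D})$ is that $\mathbb{D} \models R(Q)(\bar b)$. Since $\mathbb{D}'$ is a
superstructure of $\mathbb{D}$ we also have $\mathbb{D}' \models R(Q)(\bar b)$. But the
necessary condition for $(\bar b, Q^{G \to R})$ to be in $ToDo(\mathcal{T}_{\mathcal{Q}}, \mathbb{D}')$ is
that $\mathbb{D}' \not\models R(Q)(\bar b)$. □

Of course both the lemma and its proof also hold for the
colors reversed.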


OUTLINE 

The rest of this paper is devoted to the proof of Theorem 4.
The proof is by encoding the word problem for some very
specific Thue systems over a very specific alphabet (being
a subset of) $A_s$.

In Section IV we study s-piders, which are elements of
the set $A_s$, and s-pider queries $F_s$, which are partial functions
from $A_s$ to $A_s$.

Then, in Section V and later, we show how to concatenate
s-piders into words, and how to modify $F_s$ to get functions
that take a pair of elements of $A_s$ as an input, and output
pairs of elements of $A_s$. This opens the way to Thue systems
encoding.


IV. S-PIDERS AND GRAPH REACHABILITY 

Let $s \in \mathbb{N}$ be fixed and let $\Sigma$ be a signature consisting of:

- constants $c_1, c_2, \ldots, c_s$ and $c^1, c^2, \ldots, c^s$

- binary relation symbols $C_1, C_2, \ldots, C_s$ and
$C^1, C^2, \ldots, C^s$ (the C reads as "calf" here)

- binary relation symbols $T_1, T_2, \ldots, T_s$ and
$T^1, T^2, \ldots, T^s$ (the T reads as "thigh")

- ternary relation symbol $H$.

$\Sigma_G$, $\Sigma_R$ and $\bar\Sigma$ are defined as in Section III-A.

For an element $a$ of a structure $\mathbb{D}$ over $\bar\Sigma$, by the out-degree
of $a$ we mean the number of atoms $P(a, b)$, with $P \in \bar\Sigma$
and $b \in \mathbb{D}$, which are true in $\mathbb{D}$. The in-degree is defined
in an analogous way. By the out-degree of $a$ with respect to $P$,
with $P \in \Sigma \cup \bar\Sigma$, we mean the number of atoms $P(a, b)$, with
$b \in \mathbb{D}$, which are true in $\mathbb{D}$.

From now on $i, j$ are always natural numbers from the
set $\mathbb{S} = \{1, 2, \ldots, s\}$. Another notation we use is $I, J \subseteq
\mathbb{S}$, which always means either a singleton or the empty set.
Being computer scientists, we do not distinguish between a
singleton and its only element.

A. S-piders and their taxonomy 

For a conjunction of atomic formulas $\Psi$ and for an atom
$P$ (atoms $P, P'$) occurring in $\Psi$ let $\Psi/P$ (resp. $\Psi/P, P'$) be
$\Psi$ with $P$ (resp. $P$ and $P'$) removed from the conjunction.

Let now $\Phi_s$ be defined as the following conjunction of
atomic formulas:

$\Phi_s = H(z, z_1, z_2) \wedge \bigwedge_{i=1}^{s} \left( T_i(z, x_i) \wedge T^i(z, y_i) \wedge C_i(x_i, c_i) \wedge C^i(y_i, c^i) \right)$

Definition 6: • The ideal green full s-pider, denoted as ☆,
is $A[G(\Phi_s)]$, the canonical structure of the green version of
$\Phi_s$. The ideal red full s-pider, denoted as ★, is $A[R(\Phi_s)]$.

• An ideal green 1-lame upper s-pider, denoted $☆^i$,
is $A[G(\Phi_s / C^i(y_i, c^i)) \wedge R(C^i(y_i, c^i))]$. An ideal red 1-lame
upper s-pider, denoted $★^i$, is $A[R(\Phi_s / C^i(y_i, c^i)) \wedge
G(C^i(y_i, c^i))]$.

• An ideal green 1-lame lower s-pider, denoted $☆_i$,
is $A[G(\Phi_s / C_i(x_i, c_i)) \wedge R(C_i(x_i, c_i))]$. An ideal red 1-lame
lower s-pider, denoted $★_i$, is $A[R(\Phi_s / C_i(x_i, c_i)) \wedge
G(C_i(x_i, c_i))]$.

• An ideal green 2-lame s-pider, denoted $☆^j_i$, is
$A[G(\Phi_s / C_i(x_i, c_i), C^j(y_j, c^j)) \wedge R(C_i(x_i, c_i) \wedge C^j(y_j, c^j))]$.

• An ideal red 2-lame s-pider, denoted $★^j_i$, is
$A[R(\Phi_s / C_i(x_i, c_i), C^j(y_j, c^j)) \wedge G(C_i(x_i, c_i) \wedge C^j(y_j, c^j))]$.

Notice that each of the ideal s-piders really looks exactly like
a spider: there is a head ($z$), with $2s$ legs attached to it;
each leg has length 2, and the legs are distinguishable. The head
is connected to the tail ($z_1$) and the antenna ($z_2$). But the
antenna and the tail will not bother us in this Section.

Full s-piders, red and green, are monochromatic: the head
and all legs must be of the same color. 1-lame s-piders have
one calf of the opposite color. As we distinguish between
the "upper" and "lower" legs of a s-pider, we have two kinds
of 1-lame s-piders of each color. A 2-lame s-pider has one
upper calf and one lower calf of the opposite color. Any of
the $2s$ vertices of a s-pider which are neither the head nor a
constant will be called a knee. Sometimes we will need to
be more precise, and when talking about a particular s-pider we will
use descriptions like "$i$-th upper knee", hoping that the meaning
is clear.

Definition 7: $A_s$ is the set of all $★^I_J$ and $☆^I_J$, with $I$ and $J$
as defined above. A s-pider $★^I_J$ (or $☆^I_J$) is called upper if $I$
is non-empty and is called lower if $J$ is non-empty.

In other words, $A_s$ is the set of all ideal s-piders: full, 1-lame
and 2-lame, both red and green. Notice that a 1-lame
s-pider is always either upper or lower, a 2-lame s-pider is
both, and a full s-pider is neither upper nor lower.
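
Since an ideal s-pider is determined by its color and by the (at most one) upper and (at most one) lower calf of the opposite color, the set of ideal s-piders can be enumerated directly. The Python sketch below uses an assumed encoding, a triple `(color, I, J)` with `None` standing for the empty set; the names are illustrative only. It gives 2 full, 4s 1-lame and 2s² 2-lame s-piders, i.e. 2 + 4s + 2s² in total.

```python
from itertools import product

def ideal_spiders(s):
    """All triples (color, I, J): color is "G" or "R"; I (upper) and J (lower)
    are either None (empty set) or an index in 1..s."""
    choices = [None] + list(range(1, s + 1))
    return [(c, i, j) for c, i, j in product("GR", choices, choices)]

def kind(spider):
    """Classify a s-pider as full, 1-lame or 2-lame."""
    _, upper, lower = spider
    lame = (upper is not None) + (lower is not None)
    return ("full", "1-lame", "2-lame")[lame]
```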

While ideal s-piders are finitely many ($2 + 4s + 2s^2$ of
them), for each structure over $\bar\Sigma$ there can be many, maybe
infinitely many, actual incarnations of ideal s-piders in this
structure:

Definition 8: A real s-pider is any structure $\mathcal{S}$ (in particular
a substructure of another structure) such that:

• $dalt(\mathcal{S}) \models \Phi_s$,

• if $\mathcal{S}'$ is a proper substructure of $\mathcal{S}$ then $dalt(\mathcal{S}') \not\models \Phi_s$.

The second condition of the above definition looks more 
complicated than it really is. We just do not want a house 
full of s-piders to be called a s-pider. 

B. S-pider queries and what they are good for. 

Let us first remind the reader that for each structure $\mathbb{D}$, and
each subset $V$ of $Dom(\mathbb{D})$, there exists a unique conjunctive
query $\Psi$ such that $V$ is the set of free variables of $\Psi$ and
$\mathbb{D} = A[\Psi]$:

Definition 9 (s-pider queries): 1) $f^i_j$ is the unique query
with free variables $x_j$ and $y_i$ whose canonical structure
is equal to $A[\Phi_s / C_j(x_j, c_j), C^i(y_i, c^i)]$;

2) $f^i$ is the unique query with single free variable $y_i$
whose canonical structure is equal to $A[\Phi_s / C^i(y_i, c^i)]$;

3) $f_i$ is the unique query with single free variable
$x_i$ whose canonical structure is equal to
$A[\Phi_s / C_i(x_i, c_i)]$.

By analogy with s-piders, the s-pider queries of the form
$f^i_j$ will sometimes be called 2-lame, and those of the form $f^i$ or $f_j$
will be called 1-lame. And, like 1-lame s-piders, 1-lame
s-pider queries are either upper or lower, while 2-lame ones are
both. Let $F_s$ be the set of all s-pider queries. Let us now
learn, by examples, how the green-red TGDs generated
by the queries from $F_s$ act on elements of $A_s$.

Example 1. Suppose $\mathcal{Q}$ consists of a single query $f^i_j$ for some
$i, j$ and let $\mathcal{T}_{\mathcal{Q}}$ be the set of TGDs, as defined in Section
III-B. Let us try to understand how the TGDs of $\mathcal{T}_{\mathcal{Q}}$ can be
applied to $★^i$.

$\mathcal{T}_{\mathcal{Q}}$ consists of two TGDs. One of them is $(f^i_j)^{R \to G}$.
It tries to find, in the current structure $\mathbb{D}$, a homomorphic
image of $A[R(f^i_j)]$ and, if this succeeds, it:

- produces a fresh copy of $A[G(f^i_j)]$ and

- identifies elements of this copy resulting from free
variables of $G(f^i_j)$ with elements of $\mathbb{D}$ resulting from
the respective free variables³ in $R(f^i_j)$.

The other TGD, $(f^i_j)^{G \to R}$, does the same, but with the
colors reversed.

Now, if the current structure is $★^i$, which is red, then of
course the only possible match is with $(f^i_j)^{R \to G}$. The s-pider
is lame, it lacks its upper $i$-th calf⁴, but it is not needed
for a match since $R(f^i_j)$ lacks this calf too.

³ Of course the constants from the language are seen as free variables
here, and their different occurrences are also identified.

⁴ Actually it has one, but green, and the atoms in the body of any TGD
of the form $Q^{R \to G}$ are red, so they can only match with red atoms.


Thus a new, green, copy $G(A[f^i_j])$ of $A[f^i_j]$ will be
created. How will it be connected to the original $★^i$?

Of course all the constants from $G(A[f^i_j])$ (which means
all constants from $\Sigma$ apart from $c^i$ and $c_j$) will be identified
with the respective constants in $★^i$. Also the $i$-th upper knee
of $G(A[f^i_j])$ will be identified with the respective knee of $★^i$
and the $j$-th lower knee of $G(A[f^i_j])$ will be identified with
the respective knee of $★^i$. Notice that, while $G(A[f^i_j])$ is not
a s-pider (it is two calves short of being one), we actually
created a new s-pider. It consists of the copy of $G(A[f^i_j])$,
and of two calves that it shares with $★^i$: the $i$-th upper calf of
$★^i$ (which is green) and the lower $j$-th calf of $★^i$ (which
is red, and is the only red calf of the new s-pider). Not only
have we created a new s-pider but we already have a name for
it: it is a copy of $☆_j$. We cannot resist the temptation of
writing this as:

$f^i_j(★^i) = ☆_j$

Example 2. Let now $\mathcal{Q}$ consist of a single query $f^i_j$ and
consider a s-pider $☆_k$ with $k \neq j$. Since $☆_k$ is green, there
is of course no match with the left-hand side of the TGD
$(f^i_j)^{R \to G}$. Is there a match with the left-hand side of
$(f^i_j)^{G \to R}$? Notice that the atom $G(C_k(x_k, c_k))$ occurs in
$G(f^i_j)$, but not in $☆_k$: the $k$-th lower calf of $☆_k$ is red. We
cannot resist the temptation of writing this as:


if $k \neq j$ then $☆_k \notin Dom(f^i_j)$


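Example 3. Let again $\mathcal{Q}$ be the single query $f^i_j$ for some $i, j$.
We already know that $f^i_j(★^i) = ☆_j$ or, to be more precise,
that $Chase(\mathcal{T}_{\mathcal{Q}}, ★^i) \models ☆_j$. By exactly the same argument
we get that:

• $Chase(\mathcal{T}_{\mathcal{Q}}, ★_j) \models ☆^i$,

• $Chase(\mathcal{T}_{\mathcal{Q}}, ☆^i) \models ★_j$,

• $Chase(\mathcal{T}_{\mathcal{Q}}, ☆_j) \models ★^i$.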


Definition 10: Suppose $\mathcal{S}, \mathcal{S}' \in A_s$ are such that $\mathcal{S} \neq \mathcal{S}'$.
Let $X \in \{G, R\}$ be the color of $\mathcal{S}$ and let $Y$ be the opposite
color. Let $T$ be the TGD: $X(f)(\bar w, \bar u) \Rightarrow \exists \bar v\ Y(f)(\bar v, \bar u)$.
Then by $f(\mathcal{S}) = \mathcal{S}'$ we will mean that:

• $\mathcal{S} \models (X(f))(\bar b)$ for some tuple $\bar b$ of elements of $\mathcal{S}$;

• $\mathcal{S}'$ is a substructure of $T(\mathcal{S}, \bar b)$.

In other words, $f(\mathcal{S}) = \mathcal{S}'$ means that one of the two
green-red TGDs generated by $f$ can be applied to $\mathcal{S}$ and
that a copy of $\mathcal{S}'$ is then produced in one step. It is of
course easy to see that the color of $\mathcal{S}'$ is then $Y$.

The examples of the previous subsection can be easily
extended to a proof of:


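Lemma 11 (Algebra of s-piders): Let $I, J, I', J' \subseteq \mathbb{S}$ be as
before. Then $f^I_J(★^{I'}_{J'})$ is defined if and only if $I' \subseteq I$ and
$J' \subseteq J$. If this is the case then $f^I_J(★^{I'}_{J'}) = ☆^{I \setminus I'}_{J \setminus J'}$.

The same holds for the colors reversed.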
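
The algebra of s-piders can be written down as a partial function on triples. The sketch below is an illustration under explicit assumptions: a s-pider is encoded as `(color, I', J')` and a query as `(I, J)`, with `None` for the empty set, and the lemma is read as "defined exactly when $I' \subseteq I$ and $J' \subseteq J$, and the result has the opposite color and indices $I \setminus I'$, $J \setminus J'$", which is the reading consistent with Examples 1 to 3. The function name is hypothetical.

```python
def apply_query(query, spider):
    """Return f(spider) as a (color, I, J) triple, or None if spider is outside Dom(f).

    query  = (I, J)           indices of the query f^I_J, None meaning "empty"
    spider = (color, I', J')  color "G" or "R", None meaning "empty"
    """
    (qi, qj), (color, si, sj) = query, spider
    # I' must be a subset of I and J' a subset of J (all are singletons or empty).
    if si not in (None, qi) or sj not in (None, qj):
        return None
    flipped = "G" if color == "R" else "R"
    new_i = qi if si is None else None   # I \ I'
    new_j = qj if sj is None else None   # J \ J'
    return (flipped, new_i, new_j)
```

With $i = 1$ and $j = 2$, `apply_query((1, 2), ("R", 1, None))` gives `("G", None, 2)`, matching Example 1, and `apply_query((1, 2), ("G", None, 3))` gives `None`, matching Example 2.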


C. Example: Encoding graph reachability 

As one more toy example, consider an undirected graph
$(V, E)$, with $V = \{v_1, v_2, \ldots, v_t\}$ and $E = \{e_1, e_2, \ldots, e_{t'}\}$.
Suppose, for simplicity of presentation, that the degree of $v_1$
is exactly 1, and that $e_1$ is the only edge containing $v_1$.

Let $s \in \mathbb{N}$ be such that $s > t$ and $s > t'$ and let the set $\mathcal{Q}$
contain the following s-pider queries: $f_1$, $f^2$ and, for each
triple $i, j, k$ such that $e_k = \{v_i, v_j\}$, two queries: $f^i_k$ and $f^j_k$.

Now we can represent graph reachability as an instance
of GRCQDP:

Observation 12: The two conditions are equivalent:

(i) There is a path, in $(V, E)$, from $v_1$ to $v_2$;

(ii) $Chase(\mathcal{T}_{\mathcal{Q}}, ☆) \models ★$.

For the proof⁵ suppose that there is a path
$v_1, e_1, v_{i_1}, e_{j_1}, \ldots, v_2$ from $v_1$ to $v_2$. One can
see that $Chase(\mathcal{T}_{\mathcal{Q}}, ☆)$ contains the following s-piders:
$★_1$ (produced from ☆ by $f_1$), $☆^{i_1}$ (produced from
$★_1$ by $f^{i_1}_1$), and so on. For each vertex $v_k$ reachable from
$v_1$ the green 1-lame upper s-pider $☆^k$ will be at some point
added to the chase, and for each edge $e_k$ reachable from
$v_1$ the red 1-lame lower s-pider $★_k$ will be added. Finally,
once we have $☆^2$, the query $f^2$ can be used to produce ★.

But wait: how about the opposite direction? Clearly, the
queries of $\mathcal{Q}$ were designed to only produce the red s-piders
for reachable edges and green ones for reachable vertices (as
above), but how can we be sure that there are no side-effects
leading to the creation of ★ even if $v_2$ is not reachable from
$v_1$? There could be two sources of such side-effects. One is
that, due to the complicated structure of $Chase(\mathcal{T}_{\mathcal{Q}}, ☆)$, new
real s-piders could emerge there⁶ which were not produced
as $f(\mathcal{S})$, for $f \in \mathcal{Q}$ and $\mathcal{S}$ previously in $Chase(\mathcal{T}_{\mathcal{Q}}, ☆)$.
This could in principle happen: the s-piders share constants
and knees, and who knows what more.

The second possible source of problems is that some weird
application of queries from $\mathcal{Q}$ to the s-piders we ourselves
produced could lead to the creation of something more than
just the representations of reachable vertices and edges (as
described above).

As it turns out, and as we are going to show before the
end of this Section, there are no side-effects of the first sort,
and while there are indeed some side-effects of the second
sort, they are "sterile" and thus controllable.
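
At the level of abstraction where s-piders are symbols and queries are partial functions, the reachability encoding can be simulated by a simple closure computation. The Python sketch below is only an illustration of that abstract level and does not run the chase itself. It assumes the encoding of a s-pider as `(color, I, J)` and of a query as `(I, J)` with `None` for the empty set, reads the algebra of s-piders as "the result has the opposite color and indices $I \setminus I'$, $J \setminus J'$", and takes the query set to be $f_1$, $f^2$ and $f^i_k$, $f^j_k$ for each edge $e_k = \{v_i, v_j\}$. All function names are hypothetical.

```python
def apply_query(query, spider):
    """f^I_J applied to a s-pider; None if the s-pider is outside the domain."""
    (qi, qj), (color, si, sj) = query, spider
    if si not in (None, qi) or sj not in (None, qj):
        return None
    flipped = "G" if color == "R" else "R"
    return (flipped, qi if si is None else None, qj if sj is None else None)

def zoo(queries, start=("G", None, None)):
    """Smallest set of ideal s-piders containing `start` (the green full
    s-pider by default) and closed under the given queries."""
    seen, frontier = {start}, [start]
    while frontier:
        spider = frontier.pop()
        for q in queries:
            image = apply_query(q, spider)
            if image is not None and image not in seen:
                seen.add(image)
                frontier.append(image)
    return seen

def reachability_queries(edges):
    """edges maps k to (i, j), meaning e_k = {v_i, v_j}."""
    queries = {(None, 1), (2, None)}      # f_1 and f^2
    for k, (i, j) in edges.items():
        queries |= {(i, k), (j, k)}       # f^i_k and f^j_k
    return queries
```

On the path graph with $e_1 = \{v_1, v_3\}$ and $e_2 = \{v_3, v_2\}$ the red full s-pider `("R", None, None)` ends up in the closure, while on the graph with only $e_1 = \{v_1, v_3\}$ (so that $v_2$ is isolated) it does not. The closure also contains 2-lame s-piders such as `("R", 3, 1)`, which correspond to the "sterile" side-effects mentioned above.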

D. Understanding the structure of $Chase(\mathcal{T}_{\mathcal{Q}}, ☆)$

We want to make sure that our abstraction of low-level
structures, like s-piders and TGDs, as high-level objects, namely
symbols of $A_s$ and partial functions $f^I_J : A_s \to A_s$, is correct, in
the sense that there are no uncontrollable side-effects. And
this is what the following series of lemmas is about.

⁵ Remember, this is an example, so the goal is to see the mechanisms
rather than a rigorous proof.

⁶ To be more precise, what we really fear here are not new s-piders but
new, unintended, matchings with the left-hand side of some TGD from $\mathcal{T}_{\mathcal{Q}}$.
See Lemma 14(iv).


Let $\mathcal{Q} \subseteq F_s$. We are going to analyze the structure
of $Chase(\mathcal{T}_{\mathcal{Q}}, ☆)$. Let $\{\mathbb{D}_k\}_{k \in \Omega}$ be a fair sequence
(with respect to $\mathcal{T}_{\mathcal{Q}}$ and ☆) and recall (see Section II-C)
that $Chase(\mathcal{T}_{\mathcal{Q}}, ☆)$ is defined as $\bigcup_{k \in \Omega} \mathbb{D}_k$. The basic proof
technique will be induction on $k$.

Lemma 13: Each knee in $Chase(\mathcal{T}_{\mathcal{Q}}, ☆)$ has out-degree 1.
Each red head has out-degree 1 with respect to any red $T_i$
and to any red $T^i$ and out-degree 0 with respect to any other
relation. The same is true, with colors reversed, for green
heads.

Proof: Induction. For the induction step notice that atoms of
relations $C_i$, $C^i$, $T_i$, $T^i$ can only be created by the TGDs
from $\mathcal{T}_{\mathcal{Q}}$ together with their leftmost argument. This means
that an application of a rule from $\mathcal{T}_{\mathcal{Q}}$ can never add an outgoing
edge to an already existing element (notice that this
is not true about incoming edges, and this is why s-piders can
share a calf). □

Lemma 14 (No low-level side-effects): Suppose
an element $a$ in $Chase(\mathcal{T}_{\mathcal{Q}}, ☆)$ is such that
$Chase(\mathcal{T}_{\mathcal{Q}}, ☆) \models H(a, a_1, a_2)$, for some $a_1, a_2$. Then:

(i) There exists exactly one real s-pider $\mathcal{S}$ in $Chase(\mathcal{T}_{\mathcal{Q}}, ☆)$
such that $a$ is the head of $\mathcal{S}$.

(ii) $\mathcal{S}$ is created together with $a$, which means that if $\mathbb{D} \in
\{\mathbb{D}_k\}_{k \in \Omega}$ is such that $a \in Dom(\mathbb{D})$ then $\mathbb{D} \models \mathcal{S}$.

(iii) There is an $\mathcal{S}' \in A_s$ such that $\mathcal{S}$ and $\mathcal{S}'$ are isomorphic.

(iv) Suppose $f \in F_s$ and $h$ is a homomorphism from
$A[R(f)]$ (or $A[G(f)]$) to $Chase(\mathcal{T}_{\mathcal{Q}}, ☆)$ such that $h(z) = a$.
Then there exists a homomorphism from $A[R(f)]$ (resp.
$A[G(f)]$) to $\mathcal{S}'$.

The sense of Claim (iii) is that a priori $\mathcal{S}$ could have
more than just two calves of the color that is opposite to the
color of its head, and then, still being a real s-pider, it would
not be isomorphic to anything in $A_s$. Proof of the Lemma
(which we skip) is a straightforward induction, using Lemma 13.

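Lemma 15: Let $zoo(\mathcal{Q})$ be the set of all s-piders $\mathcal{S} \in A_s$
which are isomorphic to some real s-pider in $Chase(\mathcal{T}_{\mathcal{Q}}, ☆)$.
Then $zoo(\mathcal{Q})$ is the smallest subset of $A_s$ containing ☆ and
closed under functions from $\mathcal{Q}$.

Proof: We know, from Lemma 11, that if $\mathcal{S}' = f(\mathcal{S})$ for some
$f \in \mathcal{Q}$ and some $\mathcal{S} \in A_s$ then $Chase(\mathcal{T}_{\mathcal{Q}}, \mathcal{S}) \models \mathcal{S}'$. To see
that $zoo(\mathcal{Q})$ is closed under functions from $\mathcal{Q}$ notice that if
$Chase(\mathcal{T}_{\mathcal{Q}}, ☆) \models \mathcal{S}$ then $Chase(\mathcal{T}_{\mathcal{Q}}, \mathcal{S})$ is a substructure of
$Chase(\mathcal{T}_{\mathcal{Q}}, ☆)$ and, in consequence, also $Chase(\mathcal{T}_{\mathcal{Q}}, ☆) \models
\mathcal{S}'$. For the opposite direction use Lemma 14. □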

One more lemma we will need is:

Lemma 16: Suppose each query in $\mathcal{Q}$ is lower. Then a
s-pider in $zoo(\mathcal{Q})$ is red if and only if it is lower.

Proof: By the usual induction on the fair sequence.


E. Idempotence and sterile s-piders 

Notice that it may very well be the case that more than
one copy of some $\mathcal{S} \in A_s$ will be created in $Chase(\mathcal{T}_{\mathcal{Q}}, ☆)$.

Let for example Q be {fi,f 2 , , ff|}. Then a copy of 

can be constructed by first applying fi to ☆, and then to 
the resulting But a different copy of will be produced 
by first applying £2 to ☆, and then ff to the resulting * 2 - 

Imagine however that, after constructing (a copy S of) 
☆g in the first way, as ffg(fi(^)) we try to apply to 
S. According to Lemma [TT] it is of course possible, and 
the result is a copy of ★g. But it is not a new copy: it 
follows from Lemma 5 that a second consecutive use of TGDs
generated by the same query does not add to the Chase. In
this context the following lemma will be particularly useful:

Lemma 17 (2-lame s-piders are sterile): Suppose S is a
real 2-lame s-pider in some stage D of Chase(T_Q, ☆). Then
S will never be used as a left hand side of a TGD execution
leading to one of the later stages.

Proof: Suppose S is isomorphic to (the proof does not 
change if S is green). It follows from Lemma [TT] that S 
could only be a result of applying the TGD to 

☆ . But the only TGD that matches with S is By 

Lemma |5] it cannot however be now applied. □ 

Notice that, whenever we have Q containing 2-lame
queries, like in Subsection IV-C, some sterile 2-lame red
s-piders will appear in Chase(T_Q, ☆).

OUTLINE 

The queries in F_s (or functions, depending on the level
of abstraction at which one wants to see them) which we
considered so far were unary, in the sense that they acted on
single s-piders. In the rest of the paper we want them to be
binary, so that they can rewrite words from A* in a
context-sensitive way. The ability to encode such rewriting is
the key to undecidability.


V. Binary queries 

We will now define two operations - X and Y - each of
them taking two queries from F_s and returning a new ’’binary”
query, from the set that we will call Fg. 

It may be good to recall here what the free variables
of the s-pider queries from F_s are: 2-lame queries have two free
variables, and 1-lame queries have one: the free variables are
the knees of the legs with missing calves. When a s-pider
query f is seen as a green-red TGD, the free variables are
what connects the new part of the structure, added by a
single execution of a TGD, to the old part.

^They of course also connect via the constants.

Definition 18: For f, f' ∈ F_s consider disjoint copies G of
A[f] and G' of A[f']. Let V and V' be the sets of elements
of G and G' and let W and W' be the subsets of V and V'
consisting of free variables of f and f'. Let z_2 and z_1 be
the antenna and tail of f and let z_2' and z_1' be the antenna
and tail of f'. Let U(f, f') be the (disjoint) union of G and
G'. Then:

• f X f' is the unique conjunctive query whose canonical
structure is U(f, f'), with z_2 and z_2' identified, and with the
set of free variables equal to W ∪ W' ∪ {z_1, z_1'};

• f Y f' is the unique conjunctive query whose canonical
structure is U(f, f'), with z_1 and z_1' identified, and with the
set of free variables equal to W ∪ W' ∪ {z_2, z_2'}.

The set of all f X f' (or f Y f') for f, f' ∈ F_s will be
called F_s^X (resp. F_s^Y). We also define F_s^2 as F_s^X ∪ F_s^Y.

The main lemma, obviously implying Theorem |4] is; 

Lemma 19: It is an undecidable problem to determine, for
given s ∈ ℕ and given Q ⊆ F_s^2, whether Chase(T_Q, ☆) ⊨ ★.

The rest of the paper is devoted to the proof of this lemma. 

VI. Abstracting from the s-pider details 

Let Q C F^ be a set of binary queries. We would 
like to understand the structure of Chase(T_Q, ☆) so that, in
particular, we understand when Chase(T_Q, ☆) ⊨ ★ holds.

First of all notice that Lemmas 13 and 14 survive in the
new context - together with their proofs. But be careful here:

Lemma 20: For each pair a_1, a_2 of elements of
Chase(T_Q, ☆) there are at most two elements a such
that Chase(T_Q, ☆) ⊨ H(a, a_1, a_2).

Proof: Induction. For the induction step notice that an atom 
H(a, Oi, 02 ) can only be created together with either a new 
element oi (if a TGD generated by a query from is used) 
or with a new 02 (when the query is from F^). And notice 
that a single execution of a TGD generated by a query from 
F^ creates two atoms of the predicate H, and the newly 
created oi occurs in both of them. Notice that the newly 
created spiders are always both of the same color. □ 

A. Queries f_1 X f_2 and f_1 Y f_2 in action

Let now Q ∈ Q be of the form* f_1 X f_2 and suppose D
is a structure (a stage of Chase(T_Q, ☆)).

Consider the TGD and let us try to imagine how 

this TGD could be executed in D. First a homomorphism h
from A[R(Q)] to D needs to be found.

A[R(Q)] contains 3 antenna/tail vertices: z_1, z_1' and z_2,
joined by the atoms R(H)(z, z_1, z_2) and R(H)(z', z_1', z_2).
This means that two red atoms R(H)(h(z), h(z_1), h(z_2)) and
R(H)(h(z'), h(z_1'), h(z_2)) must be found in D.

Notice that, due to Lemma 20, once h(z_1), h(z_1') and
h(z_2) are fixed there are at most two possible choices for
each of h(z) and h(z'). And once h(z) and h(z') are fixed
then, due to Lemma 14, there is exactly one real s-pider S_1
in D with R(H)(h(z), h(z_1), h(z_2)) and exactly one real
s-pider S_2 in D with R(H)(h(z'), h(z_1'), h(z_2)).

*The case when Q is of the form f_1 Y f_2 is analogous.

Now, in order for the query A[R(Q)] to be executed we
need the query R(f_1) to match with S_1 and the query R(f_2)
to match with S_2. Lemma 11 tells us when it is possible.

Once the triples h(z), h(z_1), h(z_2) and
h(z'), h(z_1'), h(z_2), satisfying all the above constraints, are
found, a copy of A[G(Q)] is created, consisting of two
green s-piders f_1(S_1) and f_2(S_2). This is because on the
level of individual s-piders we are exactly in the world of
Section IV.

What is new is how the two new s-piders are connected
to each other and to the old part of the structure: the antenna
of f_1(S_1) is a new element - it was quantified in Q - and
is identified with the antenna of f_2(S_2).

But the tail of f_1(S_1) was free in Q so it is identified with
h(z_1), and the tail of f_2(S_2) was free and it is identified with
h(z_1'). So the new copy of A[G(Q)] is connected to the old
structure via the tails of the two new s-piders.

Of course the two new s-piders are also connected to the 
old structure via the free variables (and constants) which 
are not in their H atoms. But there are two reasons why we
do not need to bother about it. The first of them is Lemma 14.
Second is that, while each TGD generated by a query from 
Fg needs two spiders to be executed, and requires them to 
share their antennas (or tails), it is oblivious to any other 
possible connections between the two s-piders (via knees). 
This analysis shows that we now can completely abstract 
from the low-level implementation details of the s-piders, in 
particular from details like the relations C®, Gi, T®, and 
concentrate on the high-level notions. 

B. S-warm and s-warm rewriting rules 

A s-warm is defined as a multi-labeled graph (which
means that each edge can have one or more labels), whose
edges are the H atoms of some structure (intended to be a
stage of Chase(T_Q, ☆)):

Definition 21: A s-warm 𝔻 is a ternary relation H ⊆ A_s × D
× D. To keep notations light we use the term ’’elements
of 𝔻” for elements of D. Elements of A_s are labels. We
assume that for each two elements a, b of a s-warm there
are at most two s-piders S such that H(S, a, b), and that
they are of the same color. Atoms H(S, a, b), or just pairs
a, b, such that 𝔻 ⊨ H(S, a, b) for some S, are called edges.
An edge is green or red, depending on the s-piders being its
labels.

We are going to see queries from F^ as s-warm rewriting 
rules: 

Definition 22: Let Q = f X f' (or Q = f Y f') be from F_s^2
and let 𝔻 be a s-warm. We say that a rewriting Q can be
executed in 𝔻 if:

① there are edges H(S, a, b) and H(S', a', b) (resp.
H(S, a, b) and H(S', a, b')), such that S ∈ Dom(f) and
S' ∈ Dom(f') are both of the same color;

② there is no b' (resp. a') such that H(f(S), a, b') and
H(f'(S'), a', b') (resp. H(f(S), a', b) and H(f'(S'), a', b'))
are edges of 𝔻.

The pair of edges H(S, a, b) and H(S', a', b) is called the input
of the rewriting (notice that order is important here). The
result of the rewriting is then a new structure 𝔻' being 𝔻 with
a new vertex b' (resp. a') and new edges H(f(S), a, b') and
H(f'(S'), a', b') (resp. H(f(S), a', b) and H(f'(S'), a', b'))
as above.

Notice that we did not require in the above definition that
a ≠ a' (resp. b ≠ b'). Not only do we have no means to enforce
such a requirement, but also, since we begin the Chase from
a single (full green) s-pider, the possibility of having them
equal is of crucial importance for us.

^Unless it already existed.

^Of course this is all true also for the colors reversed.
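The definition above can be read operationally. The following is a minimal Python sketch, not the authors' construction: a s-warm is modelled as a set of triples (label, a, b) standing for atoms H(S, a, b), a query f is modelled as a dictionary from input labels to output labels (so Dom(f) is its key set), and colors come from a hypothetical dictionary `color`. All names (`can_execute_x`, `execute_x`, the label strings) are illustrative assumptions. Only the X case is shown; the Y case is symmetric.

```python
def vertices(swarm):
    """All vertices occurring as second or third argument of some edge."""
    return {v for (_, a, b) in swarm for v in (a, b)}


def can_execute_x(swarm, f, g, color, e1, e2):
    """Check both conditions of the definition for the rule f X g on inputs e1, e2."""
    (s1, a1, b1), (s2, a2, b2) = e1, e2
    if e1 not in swarm or e2 not in swarm:
        return False
    # First condition: the inputs share their third argument, the labels are in
    # the domains of f and g, and both labels have the same color.
    if b1 != b2 or s1 not in f or s2 not in g or color[s1] != color[s2]:
        return False
    # Second condition: the two output edges do not already exist for some b'.
    return not any(
        (f[s1], a1, v) in swarm and (g[s2], a2, v) in swarm
        for v in vertices(swarm)
    )


def execute_x(swarm, f, g, color, e1, e2):
    """Return the s-warm after one execution, or an unchanged copy if blocked."""
    if not can_execute_x(swarm, f, g, color, e1, e2):
        return set(swarm)
    (s1, a1, _), (s2, a2, _) = e1, e2
    fresh = max(vertices(swarm)) + 1  # new vertex b' (vertices are integers here)
    return set(swarm) | {(f[s1], a1, fresh), (g[s2], a2, fresh)}


# Toy run: a single green edge, used twice as input (a = a' is allowed).
color = {"full_green": "green", "red_1": "red", "red_2": "red"}
f = {"full_green": "red_1"}
g = {"full_green": "red_2"}
start = {("full_green", 0, 1)}
edge = ("full_green", 0, 1)
after = execute_x(start, f, g, color, edge, edge)
```

In this toy run a second call with the same inputs is blocked by the second condition, which mirrors the idempotence remark made earlier for TGDs generated by the same query.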

See that - while we are not literally talking about TGDs
now - the two conditions of Definition 22 are analogous to the
two conditions from Section III-B, and we can still (like in
Section III-C) define a fair (with respect to a set Q of
rewritings and an original s-warm 𝔻) sequence of structures
{C_k}_{k∈ℕ}, with each C_k being a result of a single execution
of a rewriting rule in the structure ⋃_{i<k} C_i, and with each
possible rewriting ultimately being executed. We can also
define the fixpoint of the rewritings, as the union ⋃_{k∈ℕ} C_k.
To distinguish, we will call the union chase(Q, 𝔻).

C. The abstraction 

Let D be a structure over Σ, such that if D ⊨ H(a, b, c)
then there is exactly one real s-pider S in D having a as
its head, and such that each real s-pider in D is isomorphic
to some element of A_s. The following definition and lemma
hardly come as a surprise:

Definition 23: The s-warm s-warm(D) is defined as the set
of all triples H(S, b, c) such that D ⊨ H(a, b, c) and a is the
head of a real s-pider in D which is isomorphic to S.

From now on let 𝔻_☆ be the s-warm consisting of a single
edge labeled with ☆. Define ℱ as the set of all fair (with
respect to the set T_Q of TGDs and the structure ☆) sequences
{D_k}_{k∈ℕ} and let ℱ' be the set of all fair (with respect to
the set Q of rewriting rules and the s-warm 𝔻_☆) sequences
{C_k}_{k∈ℕ}.

Lemma 24: The mapping that maps a sequence {D_k}_{k∈ℕ}
of structures to a sequence {s-warm(D_k)}_{k∈ℕ} of s-warms
is a bijection from ℱ to ℱ'.

For a given set Q ⊆ F_s^2 let C^Q = chase(Q, 𝔻_☆). From
now on we forget about Chase(T_Q, ☆) and TGDs and
concentrate on s-warms and their rewritings. Due to Lemma
24, in order to prove Lemma 19 it is enough to show:

Lemma 25: It is an undecidable problem to determine, for
given s ∈ ℕ and given set Q ⊆ F_s^2 of rewriting rules,
whether C^Q contains any edge labeled with ★.

D. One more lemma 

Before we end this Section it may be illuminating
to notice one peculiar property of C^Q. The proof of the
following lemma goes by easy induction:

Lemma 26: Let Q ⊆ F_s^2. Then each vertex of C^Q either has
in-degree zero (such a vertex will be called a tail, as it is the
tail of all the edges it belongs to) or has out-degree zero
(and it is the antenna of all the edges it belongs to). This
implies that all the directed H-paths in C^Q have length one.
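The property in the lemma is easy to test on a finite s-warm. The sketch below is only an illustration under stated assumptions: edges are triples (label, a, b) standing for H(S, a, b), with a read as the tail and b as the antenna, which is how the surrounding text appears to use the two positions. The function names and sample labels are hypothetical.

```python
def tails_and_antennas(swarm):
    """Vertices used as second argument (tails) and as third argument (antennas)."""
    tails = {a for (_, a, _) in swarm}
    antennas = {b for (_, _, b) in swarm}
    return tails, antennas


def has_two_row_shape(swarm):
    """True if no vertex is both a tail of one edge and an antenna of another."""
    tails, antennas = tails_and_antennas(swarm)
    return tails.isdisjoint(antennas)


example = {("full_green", 0, 1), ("red_1", 0, 2), ("red_2", 0, 2)}
broken = example | {("extra", 2, 3)}  # vertex 2 would be antenna and tail at once
```

When `has_two_row_shape` holds, every directed path has length one, because no edge can start where another edge ends.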

Now imagine vertices of C® drawn in two rows - all the 
antennas in the upper row and all the tails in the lower one 
- and see how mnemonic the fonts X and Y are. 

VII. An important example (quite complicated) 

Consider a set consisting of the following three pairs 
of associated rewritings: 

® A: fi X h and B: ff X 

@ A: X h and B: X f? 

(D A: Y fe and B: Y C 

where a, /3o, ?7o, /3i, i?i S S, 

And let us try to have a glimpse of C®''. Let H(*, soXo) 
be the only edge of . 

The table describes a (finite prefix of) an infinite sequence 
of rewritings that will be of special importance for us. Newly 
created elements are marked with bold. 


Input 

edges 

Rule 

used 

Output 

edges 

H(^,so,G), H(^,so,G) 

H(*i,So,T), H(*2,So,T) 

, So, ii), !!(☆, So, to) 

H(★5,s',^l), H(*6 ,s',G) 
H(^™,si,to), H(^,so,G) 

H(*3,Si,T'), H(*4,So,T') 

, So, G), !!(☆, So, to) 

®A 

®B 

@A 

@B 

@A 

@B 

@A 

H(*i,so,t'), H(* 2 ,so,t') 
H(i;r“,so,ti), H(*’'i,so,ti) 
H(*g,s',ii), H(*6 ,s',G) 
H(^^Xsi,G), H('*™,si,io) 

H(* 3 ,Sl,t"), H(* 4 ,So,t") 

H(i;r^Xsi,t2), H(i;r’>Lso,t2) 
H(*5 ,s",G), H(*6,s",G) 


Compare the two rewritings using the rule (DA and notice 
the recursion. Then proving the following lemma will be an 
easy exercise: 

Lemma 27: There are infinite sequences and 

Si,S 2 ,... of elements of such that: //(☆“, sQ) fi)> 
for each k there is , Sk,tk) and H(t!^°, Sk,tk+i) and 

VIII. Friendly Thue systems 

We will now consider Thue systems Π ⊆ S* × S*.
Elements of S are numbers, so some of them are even and
some are odd. We think that α, β_0 and η_0 are even and β_1
and η_1 are odd.

A set of productions of a Thue system Π ⊆ S* × S* will
be called friendly if Π = Π_< ∪ Π_= where:

• Π_< consists of two pairs {η_0, β_0η_1} and {η_1, β_1η_0};

• all productions of Π_= are of the form {ij, i'j'} for
i, j, i', j' ∈ S;

• if {ij, i'j'} ∈ Π_= then both i and i' are odd and both j
and j' are even or both i and i' are even and both j and j'
are odd;

• there is no production of Π_= of the form {ij, ij'} or
{ij, i'j};

• there is no production involving α; no production in Π_=
involves η_0 or η_1;

• there is an odd γ ∈ S, and even γ' ∈ S which occur (each
of them) in exactly one production of Π, which is {ii', γγ'}
for some i, i' ∈ S;

• s > 2|Π|.

It is easy to prove (using the techniques presented in
[Dav77]) that the problem:


For a set of productions of a friendly Thue system Π,
do there exist w, w' ∈ S* such that wγγ'w' ⇔*_Π αη_1?


is undecidable.
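To make the symmetric rewriting relation concrete, here is a small Python sketch of a bounded search in a Thue system. It is only an illustration: words are tuples of integers, a production is an unordered pair given as a 2-tuple of words, and the concrete numbers chosen for α, β_0, η_0, β_1, η_1 are hypothetical. Since the problem is undecidable, the step bound `max_steps` is essential; a negative answer only means "not found within the bound".

```python
def one_step(word, productions):
    """All words obtained by replacing one occurrence of either side of a production."""
    results = set()
    for left, right in productions:
        for old, new in ((left, right), (right, left)):  # Thue systems are symmetric
            k = len(old)
            for p in range(len(word) - k + 1):
                if word[p:p + k] == old:
                    results.add(word[:p] + new + word[p + k:])
    return results


def derivable_within(src, dst, productions, max_steps):
    """Breadth-first search: is dst reachable from src in at most max_steps steps?"""
    seen = {src}
    frontier = {src}
    for _ in range(max_steps):
        frontier = {v for w in frontier for v in one_step(w, productions)} - seen
        if not frontier:
            break
        seen |= frontier
    return dst in seen


# Hypothetical numbering: alpha, beta_0, eta_0 even; beta_1, eta_1 odd.
ALPHA, BETA0, ETA0, BETA1, ETA1 = 0, 2, 4, 1, 3
# The two productions of the first part of a friendly system.
PI_LESS = [((ETA0,), (BETA0, ETA1)), ((ETA1,), (BETA1, ETA0))]
```

With these two productions the word αη_1 rewrites in two steps to αβ_1β_0η_1, which is the m = 1 instance of the words α(β_1β_0)^m η_1 used later in the text.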

Let now Π be a fixed friendly Thue system. Lemma 25
and therefore Theorem IH will be proved once we construct 
a set Q of rewritings such that the following two conditions
are equivalent:

(i) There is an edge in C^Q labeled with ★;

(ii) wγγ'w' ⇔*_Π αη_1 for some w, w' ∈ S*.

The following Lemma is easy to prove and will be useful:

Lemma 28: Condition (ii) holds if and only if there is m ∈ ℕ
such that wγγ'w' ⇔*_{Π_=} α(β_1β_0)^m η_1 for some w, w' ∈ S*.

A. The set Q 

First we define Qq, as the set of rewritings consisting of: 

- all the rewriting rules from 

- two associated rewriting rules @ fj X fj f 

for each production p = {ij, i'j'} in 11 with i even; 

- two associated rewriting rules (D fj Y Hp ^nd fj Y Hp 
for each production p = {ij, i'j'} in n with i odd; 

where each of the numbers Ip and each Cp only occurs in 
the two aforementioned rewriting rules. Finally, Q is defined 
as Qq with one additional rewriting rule ® ff'*' Y with r 
not occurring anywhere in the rules of Qq. 


OUTLINE 

The rest of the paper is devoted to understanding the
structure, first of C^{Q_0} = chase(Q_0, 𝔻_☆) and then of
C^Q = chase(Q, 𝔻_☆), in order to prove that the two conditions
above are equivalent. In the following Section IX we prove
that condition (ii) (the Thue-system condition) implies
condition (i) (an edge of C^Q labeled with ★).


IX. How to hunt a (full) red s-pider

Definition 29: For a s-warm 𝔻 the set 𝒲(𝔻) ⊆ S* × D × D is
defined as the smallest set such that:

• (ε, a, a) ∈ 𝒲(𝔻) for each a ∈ D;

• if (w, a, b) ∈ 𝒲(𝔻) and 𝔻 ⊨ H(☆^i, b, b') for some even
i then (wi, a, b') ∈ 𝒲(𝔻);

• if (w, a, b) ∈ 𝒲(𝔻) and 𝔻 ⊨ H(☆^i, b', b) for some odd i
then (wi, a, b') ∈ 𝒲(𝔻).

Then W(𝔻) ⊆ S* is defined as {w : ∃ a, b (w, a, b) ∈
𝒲(𝔻)}.

In other words W(𝔻) is the set of all words that can be
constructed as follows: walk an undirected path in 𝔻 (from
some a to some b) and read (and remember) the labels of
all the edges you cross. But you are only allowed to take
edges labeled by green 1-lame s-piders. And if the label is
☆^i, for an even i, then you must walk in the direction of the
edge, otherwise you must walk in the opposite direction.
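The walk described above can be sketched in a few lines of Python. This is an illustration under assumptions, not part of the paper: a s-warm is a set of triples (label, a, b) for atoms H(S, a, b); a green 1-lame label ☆^i is modelled simply by the integer i, and every other label by a string, which the walk ignores. Because paths may revisit vertices, the set of words can be infinite, so the hypothetical function `words_of_swarm` takes a length bound.

```python
def words_of_swarm(swarm, max_len):
    """Words of length <= max_len readable along walks, following the two rules:
    even labels are crossed along the edge, odd labels against it."""
    verts = {v for (_, a, b) in swarm for v in (a, b)}
    seen = {((), v, v) for v in verts}  # the triples (epsilon, a, a)
    frontier = set(seen)
    for _ in range(max_len):
        new = set()
        for (w, a, b) in frontier:
            for (label, x, y) in swarm:
                if not isinstance(label, int):
                    continue  # only green 1-lame labels may be crossed
                if label % 2 == 0 and x == b:
                    new.add((w + (label,), a, y))  # walk with the edge
                elif label % 2 == 1 and y == b:
                    new.add((w + (label,), a, x))  # walk against the edge
        frontier = new - seen
        seen |= frontier
    return {w for (w, _, _) in seen}


# Two green 1-lame edges sharing their third argument, plus one ignored edge.
example = {(2, 0, 1), (3, 2, 1), ("full_green", 0, 1)}
```

In the example the walk can cross the edge labeled 2 forwards and then the edge labeled 3 backwards, so the word (2, 3) is read, while (3, 2) is not.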

Lemma 30: If v ∈ W(C^{Q_0}) and v ⇔*_{Π_=} v' then also v' ∈
W(C^{Q_0}).

Proof: By induction it is enough to prove that if v ∈
W(C^{Q_0}) and v ⇔_{Π_=} v' in a single step then also
v' ∈ W(C^{Q_0}). Suppose that v = w_1 i j w_2, that
v' = w_1 i' j' w_2 with {ij, i'j'} ∈ Π_= and i even (the other
case is analogous) and that v ∈ W(C^{Q_0}). The last
assumption means that there are vertices a, b, c, d, e in
C^{Q_0} such that both the triples (w_1, a, b) and (w_2, d, e)
are in the triple set 𝒲(C^{Q_0}) of Definition 29 and edges
H(☆^i, b, c) and H(☆^j, d, c) are in C^{Q_0}.

By the assumption that {ij,i'j'} G n= and i even we 
have that the rewritings X Hp X Q- 

But - since C®” is closed under rewritings - the first of 
these rewritings enforces that there must be a c' in C®“ 
with H(*; ,b,c!) and d, c'). And the second of the 

rewritings enforces that there must be a c" in C®“ with 
and H(^j',d,c"). So also v' G kF(C®'>). □ 

Lemma 31: Condition (ii) (the Thue-system condition) implies
condition (i) (an edge of C^Q labeled with ★).

Proof: By Lemma 28, condition (ii) implies that there is m ∈
ℕ such that wγγ'w' ⇔*_{Π_=} α(β_1β_0)^m η_1 for some w, w' ∈
S*.

It follows from Lemma 27 and from Definition 29 that
α(β_1β_0)^m η_1 ∈ W(C^{Q_0}) for some m ∈ ℕ. By Lemma 30
this implies that the word γγ' is in W(C^{Q_0}), which means
that there are vertices a, b, b' of C^{Q_0} such that H(☆^γ, a, b)
and H(☆^{γ'}, a, b') hold in C^{Q_0}. Now use the additional Y
rule of Q to produce an edge labeled with ★. □

OUTLINE 

Now we only need to prove that condition (i) (an edge of C^Q
labeled with ★) implies condition (ii) (the Thue-system
condition). It is much more complicated than the opposite
implication.

In the rest of the paper we assume that condition (ii) does
not hold. Our goal is to show that condition (i) does not hold
either. The plan is to first consider a sequence {C_i}_{i∈ω}
(where ω is the first infinite ordinal) fair (with respect to
Q_0 and 𝔻_☆) and analyze the structure C^{Q_0}. This will be
done in Sections X-XII.

Then we must of course face the possibility that the
list ToDo(Q, C^{Q_0}) will be very much non-empty and many
(infinitely many) further rewritings may be needed. But - as
we are going to prove in Section XIII - all the edges created
by these rewritings will be sterile.


X. Getting rid of the reds 

First of all notice that all the rewritings used in Q_0 are
lower. The proof of the following lemma is by (easy)
induction, almost the same as the proof of Lemma 16.

Lemma 32: Let S be the label of some edge in C^{Q_0}. Then
S is red if and only if it is lower. In particular ★ is not a
label of any edge in C®“. Also \ cannot be a label of any 
edge in (where r is from rule ®J. 

Definition 33: Two red edges H(S, a, b) and H(S', a', b) (or
H(S, a, b) and H(S', a, b')) of C^{Q_0} will be called a married
couple if they were created in the same rewriting step. The
vertex b (resp. a) is called a knot then.

As it turns out, a knot is never touched by any edge other
than the two spouses it joins:

Lemma 34: If a is a knot then it has degree 2 in C^{Q_0}.

Proof: Suppose the knot b was created, together with red 
edges H(5,a, 6) and by an execution of some 

rule f/ X f/ (the case with f/ Y is analogous). This rule 
was applied to two green edges with labels and thus 

5 = and 5' = 

^ J 

The only way to create a new edge containing the vertex 
b, would be to find some edge H(iSi, a, b') (or H(<Si, a', 6')) 
and use a rule of the form f Y f'. But if / occurs in any 
rule from Fj then it cannot be applied to any s-pider of the 
form - this is because of the assumption that each of the 
lower subscripts can only occur in two associated rewritings. 

□ 

Lemma 35 (No children out of wedlock, whatever temptation):
Suppose H(S, a, b) is an element of a married couple of
reds in some C_n, created by an execution of some rule f
from Q_0. Then the only way for it to be a part of the input
of any future rewriting by a rule g from Q_0 is that the other
element of the input of this rewriting is its spouse and that
g is the rule associated with f.

Proof: Suppose f ∈ F_s^Y (that is, f is a Y rule; the other
case is analogous), so a is the knot joining H(S, a, b) with its
spouse. It follows from Lemma 34 that g ∈ F_s^Y - otherwise
the degree of a would be greater than 2 at some point. So the
only way for an edge to be an input of a rewriting together
with H(S, a, b)

**It is not true about Q and this is the reason why we analyze C^{Q_0} first.


is to contain the vertex a. But a only belongs to two edges
in C^{Q_0}: to H(S, a, b) and its spouse. Using the argument
from the proof of Lemma [31 that the numbers Ip and Vp 
can only occur in two associated rewritings, we get that g 
is either / itself (which is impossible due to idempotence) 
or is associated with /. □ 

Notice (and this remark will be needed in Section XIII)
that the proof does not rely on the shortage of possible
candidates who would be keen to produce offspring with
H(S, a, b). The reason for its faithfulness is inherent to
H(S, a, b) itself, and even someone like H(★, c, b) would not
change its mind (★ being the most promiscuous red label).

Lemma 36 (Sterile reds): (i) If a red H(S, a, b) is an
element of a married couple of red edges in some C_n and S is
2-lame then neither H(S, a, b) nor its spouse is ever used
as an input of a rewriting rule execution.

(ii) A rewriting in which ☆ is used as an input of any 2-lame
rewriting rule from F_s^2 leads to a pair of sterile red edges.

Proof of this Lemma is left as an easy exercise. Use the
assumption that there is no production in Π of the form
{ij, ij'} or {ij, i'j} and the argument from the proof of
Lemma 17.

From now on we assume that the sequence {C_k}_{k∈ℕ} is
such that whenever a married couple of reds is created
at some step, at the next step the only rewriting this red
marriage is able to be the input of is executed (unless this
married couple is sterile). So we can imagine that we always
execute procedures consisting of two associated rules, and
produce greens from other greens. The red edges are in the
structure but they in no way contribute to its complexity and
we do not need to think of them any more.

XI. Dangerous vertices

In our quest to understand the structure of C^{Q_0} we
now concentrate on the green edges. We already know
(from Lemma 36) that 2-lame rewritings applied to ☆ never
produce anything relevant. Notice also that all the rewritings
used in Q_0 are lower, and all green edges of C^{Q_0}, apart
from edges labeled with ☆, are upper. This means that only 
2-lame rewriting rules can be applied to edges with labels 
of the form ☆*. 

In particular this means that any rewriting with the rule 
©A must take as its input two edges labeled with ☆, 
rewritings @A and @A take one edge labeled with ☆ and 
one with and so on. 

Definition 37: A vertex of C^{Q_0}, or any C_i, is called dangerous
if it is a tail or an antenna of some edge labeled with ☆.
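As a small illustration of this definition, the following Python sketch collects the dangerous vertices of a finite s-warm. It assumes the same toy encoding as a set of triples (label, a, b) for atoms H(S, a, b); the label string `"full_green"` standing for ☆ and the other label names are hypothetical.

```python
def dangerous_vertices(swarm, full_green="full_green"):
    """Vertices that are an endpoint of some edge labeled with the full green s-pider."""
    return {v for (label, a, b) in swarm if label == full_green for v in (a, b)}


example = {
    ("full_green", 0, 1),
    ("green_alpha", 0, 2),
    ("green_beta", 3, 2),
}
```

Here vertices 0 and 1 are dangerous, while 2 and 3 are not, because neither of them touches an edge labeled with the full green s-pider.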

Lemma 38: Let H(☆^i, a, b) be a green edge of C_n. Then (i)
a is dangerous if and only if i is either α or η_0 and (ii) b
is dangerous if and only if i is η_1.

Proof: By induction on n. The claim is clearly true in C_0 as
it consists of a single edge labeled with ☆.










Suppose the claim is true in some C„. Suppose the 
structure Cn +2 is a result of first applying some rewriting / 
to green edges H(5i, a, b) and H(<S 2 , o', b), (or to H(<Si, a, b) 

and H(<S 2 , a, b') -in cases where rewriting rules (D or (D 

were used), creating two new red edges, and then applying 
/', associated with /, to the two new red edges (we know, 
from Section]^ that this is the only scenario one needs to 
consider). 

As a result a new vertex b' (resp. a') is created, together 
with new green edges 6 ') and H(<S 2 , o', 6 '), (resp. 

H(<S(, o', b) and H( 52 , o', h')). We need to check that a and 
a' (resp. b and b') do not become dangerous in Cn +2 (if 
they were not in C„) and that the new edges and new vertex 
do not contradict the claim. There are now, unfortunately, 8 
cases we need to inspect, depending on / and /': 

•/,/' of the form @. Then each of Si, S 2 , S 2 is of 
the form for some * ^ a, po, pi. By assumption none of 
a,a', bis dangerous in and they remain non-dangerous in 
Cn+ 2 - The new b' is non-dangerous either. The claim holds 
in Cn+2- 

•/, /' of the form ©. Analogous to the previous case. 

•/, /' of the form ®A, ®B. Then = ^2 = ☆ so a, a' 

and b are all dangerous. 5^ = and S 2 = and the 

new b' is non-dangerous in Cn+2- The claim holds in Cn+2- 

•/, /' of the form ®B, ®A. Then = ☆“ and ^2 = 

☆''i so, by assumption, a and a' must have already been 

dangerous in C„. S[ = S 2 = if and so the new b' is created 

as dangerous. The claim holds in Cn+2- 

•/, /' of the form @A, @B. Then and 52 = *. 

By assumption b and a' are dangerous but a is not. 

and 52 = *’'\ so b' is non-dangerous and a remains non- 

dangerous in Cn+2- The claim holds in Cn+2- 

•/, /' of the form @B, @A. 5( = *^° and 52 = *’'^. By 

assumption a' is dangerous while a and b are not. 5^ = *’'“ 

and 52 = * so b' is created as dangerous but a remains 

non-dangerous in Cn+2- The claim holds in Cn+2- 

The two cases with @ are analogous to the cases with ©.□ 

XII. Characterization of W(C^{Q_0})

A word i_0 i_1 ... i_{l-1} i_l ∈ S* is correct if for all k ≠ 0 there
is i_k ≠ α and for all k ≠ l there is i_k ≠ η_0 and i_k ≠ η_1. A
word i_0 i_1 ... i_{l-1} i_l ∈ S* is maximal correct if it is correct
and i_0 = α and i_l = η_0 or i_l = η_1.
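The two notions just defined are simple positional checks, which the following Python sketch spells out. It is an illustration only: words are tuples of integers and the concrete values passed for α, η_0 and η_1 are hypothetical.

```python
def is_correct(word, alpha, eta0, eta1):
    """alpha may occur only in the first position; eta0, eta1 only in the last."""
    last = len(word) - 1
    return all(
        (k == 0 or x != alpha) and (k == last or x not in (eta0, eta1))
        for k, x in enumerate(word)
    )


def is_maximal_correct(word, alpha, eta0, eta1):
    """Correct, starting with alpha and ending with eta0 or eta1."""
    return (
        len(word) > 0
        and is_correct(word, alpha, eta0, eta1)
        and word[0] == alpha
        and word[-1] in (eta0, eta1)
    )


# Hypothetical numbering: alpha = 0, eta_0 = 4, eta_1 = 3 (beta_0 = 2, beta_1 = 1).
ALPHA, ETA0, ETA1 = 0, 4, 3
```

For instance, with this numbering the word (0, 1, 2, 3) is maximal correct, its subword (1, 2) is correct but not maximal, and (0, 3, 1) is not correct because η_1 occurs before the last position.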

Lemma 39: (i) For each correct w ∈ W(C^{Q_0}) there is a
maximal correct v ∈ W(C^{Q_0}) such that w is a subword of v.

(ii) If v ∈ W(C^{Q_0}) is maximal correct then v ⇔*_{Π_<} αη_1.

Proof: It is enough to prove that both claims hold in each 
W{Cn), and this can be proved by induction. The claim is 
clearly true in Cq as W{Co} is empty. The induction step 
follows the proof of Lemma [3^ and similar case inspection 
is needed. For /, /' of the form @ apply the argument from 
the proof of Lemma 


Also for both the @ and both the @ cases the validity of 
the induction hypothesis for n + 2 follows from the assump¬ 
tion about W{Cn) and the fact that the word w £ W{Cn+ 2 ) 
under consideration is a result of one rewriting, using one 
of the rules from n<, and applied to some word in C„. 

In the case of /, /' of the form ®B, ®A no new words 
are added to W{Cn+ 2 )- Finally, in the case of f, f' of the 
form ®A, ®B one new correct word is created'T It is arji, 
which clearly satisfies both the claims of the Lemma. □ 

Now it easily follows from Lemma 39 that:

Lemma 40: No edge in C^{Q_0} is labeled with ☆^γ or with ☆^{γ'}.

XIII. From C^{Q_0} to C^Q

For each pair of edge^ of the form H(*, a, b), H(*, a, b') 
in C®“ let us now apply a rewriting using the rule ®. Each 
such rewriting will result in adding a new vertex a' and 
new edges and H(*,7 ,a',b'). Call the resulting 

structure C. Notice that all the new vertices of C are of degree 
2 . 

Proof of Lemma[3T] and thus of Lemmal25]and Theorem 
m will be completed once we show:

Lemma 41: (i) ToDo(Q, C) is empty. In consequence C = C^Q.
(ii) There is no edge labeled with ★ in C.

Proof: Claim (ii) is obvious - there was no such edge in
C^{Q_0} and we never added one while building C on top of
C^{Q_0}. For the proof of Claim (i) first notice that no rewriting
with green inputs is possible in C:

-no such rewriting using rules from Qo is possible since 
C has no new green edges compared to C®° and 

- no such rewriting using rule ® and at least one 1-lame 
green edge is possible, since no edge of C®° is labeled with 

or ☆'>' (Lemma l40l). and 

- no such rewriting using rule ® and both inputs labeled 
with ☆ is possible any more - by the definition of C. 

Now how about the possibility of rewritings in C using 
red edges as the input? No such rewriting using rules of Qo 
and having, as the input, at least one red edge from C®“ 
is possible, by Lemma (however tempting the new red 
edges would look!). By Lemmaneither ff'*' nor match 
with any red edge from from C®“. This means that no red 
edge from C®° can be an input of any new rewriting in C. 

To finish the proof notice that none of the rewritings from 
fa can use either or as one of its inputs. Since all 
the new edges in C are of degree 2 the only rule from Fj 
that matches with the new edges of C is CD, which however 
cannot be used due to idempotence reasons. □ 

^^What is actually created is a new copy of this word. 

In formal terms this means that we extend the fair sequence {C_n}_{n∈ω}
with new structures. We do not rely on that so we do not need to prove it, but
there are infinitely many of the new structures, as there are infinitely many
edges in C^{Q_0} labeled with ☆. Thus the new fair sequence is {C_n}_{n∈2ω}.
The structure C is now - as always - defined as ⋃_{n∈2ω} C_n.